In the context of corrosion engineering
it is often natural to be concerned
with extreme events. This is because, 
firstly, it is these extreme events that 
often lead to failure and, secondly, it 
may only be possible to measure the 
extremes, with much of the underlying 
measurements by their very nature unobservable.
Statistical methods relating
to extreme value theory can be used to 
model and predict the statistical be- 
haviour of extremes such as the largest 
pit, thinnest wall, maximum penetration 
or similar assessment of a corrosion 
phenomenon. These techniques can be 
applied to the single largest value, or 
to a given number of the largest values,
measured over individual areas or coupons;
or to all values exceeding a given
threshold. The data can be modeled to 
account for dependence on environ- 
mental conditions, surface area exam- 
ined, and the duration of exposure or 
of experimentation. The application of 
a selection of these techniques is 
demonstrated on data from industry 
and from laboratory experiments. 
1. Introduction

Extremes are typically defined in two ways: either
by selecting a suitable threshold and then
recording every observation above that threshold,
or by sorting the data, according to some a priori
sampling scheme, so as to select the one, two, or
three, etc., largest value(s). The nature by which
the extremes are defined and hence measured is 
then indicative of the techniques appropriate for 
modeling and prediction. Most of the statistical 
methods relating to extreme values are based, in 
the first instance, on the assumption of an underly- 
ing large sample of possible measurements, all 
nominally arising from a single population of such 
possible measurements. For extreme value theory 
to be used, it is then only necessary for the actual 
extremes to be measured. The other possible measurements
can be ignored and may even be unobservable
with the equipment used to measure the
extremes. The nature of the extreme may be that
of a maximum value or a minimum value. In this 
paper we will assume that maximum values are of 
interest. In applications concerned with minima, 
negating the variable of interest will transform the 
problem into one concerned with maxima. 

The generalized Pareto distribution (GPD) is 
the standard family of statistical distributions to be 
used as a basis for modeling data which arise as 
exceedances over some threshold. The application of
this approach to the first of the above extreme
value definitions is examined in the following section.
Methods to ensure the validity of the standard
statistical assumptions while accumulating such
data are discussed. The generalized extreme value
(GEV) distribution can be shown to be the natural
one to use for single extremes. Data can arise as
the largest value from each of a set of coupons (in- 
dividual specimens), or from partitioning an area 
into equal smaller areas and selecting one maxi- 
mum from each smaller area. The application of 
methods considering such single extremes is also 
considered. The joint generalized extreme value 
distribution (JGEV) is the appropriate distribution 
family to use when the r (say) largest values are 
extracted, instead of just the single largest value. 
This provides a useful extension to the classical 
theory in such a way as to match up with the com- 
mon practice of measuring the few largest pits at 
any one location undergoing pitting. Using the r 
extreme order statistics in this way can increase the 
precision of the estimates in the model and hence 
improve predictions. 

Dependence on time and area can be incorpo- 
rated for prediction and extrapolation purposes 
when applying these distributions, and methods for 
modeling the dependence on environmental condi- 
tions, say, through covariates are indicated. 



2. Exceedances Above a Threshold 

These are data collected on the basis of all val- 
ues exceeding a specified threshold, taken suffi- 
ciently "high" to imply that certain limiting 
statistical results will hold. The data in Table 1, on 
pit depths in two stainless steel roofs, were collected
with just such a threshold, namely 6 μm, in
operation. This threshold qualifies as "high" on the
basis that a much lower one, such as 0.06 μm for
example, would have produced a very much larger 
sample of nascent pits. This is consistent with theo- 
ries of pitting in steel and other metals. See further 
argument supporting this approach in Ref. [1]. This 
type of data censoring can arise through built in 
limits on measurement capabilities or else through 
deliberate censoring of a given data set, typically a 
dense time series, so as to isolate the important events.

Table 1. Pit depths above 6 μm in stainless steel sheet college
roofs (area 500 m²; samples 10 cm²; thickness 400 μm)

Roof 1 (50 months)

131 106 35 26 26 25 23 20 20 18 18 18 17 16 16 15 15 15 14 14
14 14 14 14 14 14 14 12 12 12 12 12 10 10 8 8 8 8 8 8 8 8 [illegible]

Roof 2 (29 months)

140 106 95 77 72 55 55 53 52 36 33 32 32 30 28 28 26 26 25 24
24 24 22 22 20 18 18 16 16 16 16 14 14 12 12 12 8 8 8

When such data are extracted from a regular
grid of values rather than through the engineer
visually identifying isolated corrosion phenomena 
and taking one measurement on each, it may be 
necessary to edit the values so as to extract only 
local cluster maxima rather than using all nearby 
points. This is needed to "decouple" the recorded 
values and so validate the usual assumption of 
statistical independence or exchangeability. A care- 
ful combination of grid size (to match the scale of 
the phenomena being studied) and threshold (to 
select for significant phenomena) may be all that is 
necessary. 

With this form of data set, both the number, $n$,
of observations and their observed values $\{y_i\}$ are
necessarily random variables. It can be shown, see
for example Ref. [2], that, for sufficiently high
thresholds, and for a wide variety of initial distributions,
this number, $n$, of the exceedances has
asymptotically a Poisson distribution (with parameter
$\lambda$, say) and their sizes, $y$, have a generalized
Pareto distribution:

$$G(y) = 1 - (1 + \xi y/\sigma)^{-1/\xi} \qquad (1)$$

valid for $1 + \xi y/\sigma > 0$, with $\sigma > 0$ and $-\infty < \xi < \infty$.
In particular, if these distributional results hold ex- 
actly for some particular threshold, u say, then the 
maximum of this set of values has a generalized 
extreme value distribution (see next section) ex- 
actly, and this will be true for all higher thresholds. 
A check that the distribution, Eq. (1), holds can be 
made by graphing the mean excess plot, in which 
the mean exceedances in the data are plotted 
against increasing threshold values. This plot 
should follow a straight line with slope $\xi/(1-\xi)$
and intercept $\sigma/(1-\xi)$, with a horizontal plot
corresponding to $\xi = 0$ and a simple exponential
distribution for the tail. For extrapolation over larger
areas, for extremes derived from random sampling
over a large structure, often the quantity of interest
is the $N$th return level

$$q_N = u - \frac{\sigma}{\xi}\left[1 - (\lambda N)^{\xi}\right],$$

where $u$ is the threshold and $N$ is either the number of
"coupon multiples" as a measure of structure size, or else the
number of time intervals into the future. The $N$th
return level is interpreted as that level which would
be exceeded on average once every $N$ units of area
(or time).
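
The mean excess check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `mean_excess` is hypothetical, and the Roof 2 depths are transcribed from Table 1 with the second entry read as 106, which is an assumption about a garbled value.

```python
def mean_excess(values, threshold):
    """Mean of (y - threshold) over observations strictly above threshold.

    Returns None when no observation exceeds the threshold.
    """
    excesses = [y - threshold for y in values if y > threshold]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


# Roof 2 pit depths (micrometres) as transcribed from Table 1;
# reading the second entry as 106 is an assumption.
roof2 = [140, 106, 95, 77, 72, 55, 55, 53, 52, 36, 33, 32, 32, 30, 28, 28,
         26, 26, 25, 24, 24, 24, 22, 22, 20, 18, 18, 16, 16, 16, 16, 14,
         14, 12, 12, 12, 8, 8, 8]

# (threshold, mean excess) pairs that would be drawn on a mean excess plot.
plot_points = [(u, mean_excess(roof2, u)) for u in range(6, 100, 10)]
for u, m in plot_points:
    print(u, None if m is None else round(m, 1))
```

If the points fall close to a straight line over some range of thresholds, that range is a candidate for fitting Eq. (1); a roughly horizontal line would point towards an exponential tail.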
The data in Fig. 1(a) are 1024 values of "current
noise" collected during a study of the electrochem- 
ical nature of pitting. This series was "declustered" 
using a moving window of width 40 to give the iso- 
lated maxima in Fig. 1(b). A mean excess plot for 
the isolated maxima of the current noise data is 
given in Fig. 1(c). Consideration of this plot sug- 
gests that either a large threshold is required or 
that the exceedances arise from a mixture of the 
tails of underlying distributions. For an electro- 
chemical interpretation of this latter phenomenon, 
it can be noted that large narrow current spikes 
have been described as being typical of intermit- 
tent pitting corrosion, while steady broader based 
but less variable current noise has been associated 
with general corrosion, see for example Ref. [3]. 
Intermediate conditions can be associated with 
persistent pitting, widely recognized as the most 
threatening scenario for metal structures. 
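
The declustering step can be sketched as follows. The text only states that a moving window of width 40 was used, so the rule below (keep a point only if it is the largest value in a window centred on it) is one plausible reading and not necessarily the authors' procedure; `isolated_maxima` and its parameters are hypothetical names.

```python
def isolated_maxima(series, width):
    """Return (index, value) pairs that are the maximum of a centred window.

    A point is kept when no point within width // 2 positions on either
    side is larger; ties go to the earliest index in the window.
    """
    half = width // 2
    peaks = []
    for i, value in enumerate(series):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        window = series[lo:hi]
        # Keep i only if it is the first occurrence of the window maximum.
        if value == max(window) and lo + window.index(value) == i:
            peaks.append((i, value))
    return peaks


# Small synthetic series, for illustration only.
noise = [0, 1, 5, 2, 1, 0, 3, 9, 4, 1, 0, 2]
print(isolated_maxima(noise, width=4))
```

A threshold can then be applied to the isolated maxima before drawing the mean excess plot. Note that with this rule a point at the edge of the series can qualify as a local maximum, which may or may not be wanted in practice.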
Fig. 1(a). Current noise measurements (sample size = 1024).
Fig. 1(b). Isolated peaks in current noise measurements.
Fig. 1(c). Mean excess plot for current noise measurements.

The main difficulty which can arise with the 
threshold method is the choice of an appropriate 
threshold, especially when there is no a priori rea- 
son for choosing one particular threshold over an- 
other. In an experiment to consider the prediction 
of extreme corrosion rates for carbon steel in a 
simulated basalt groundwater [4], a number of 200 
mm x 200 mm coupons were exposed for varying 
lengths of time. These coupons, having been first 
cleaned to remove all corrosion products, were 
profiled with spot heights taken at the nodes of a 1 
mm lattice. This then gave, after making an adjust- 
ment for the original coupon surface, a 196x196 
array of corrosion measurements. False-color his- 
togram-equalization techniques, displayed on com- 
puter monitors, were used to validate and inspect 
the digitized spot heights from these coupons. A 
mean excess plot for a typical coupon exposed for 
26 weeks is shown in Fig. 2(a). Note that this plot 
was drawn for both the raw exceedances and also 
for declustered exceedances. The process of 
declustering essentially amounted to identifying all 
those "pits" or clusters exceeding a particular 
threshold and calculating the maximum exceedance
for each "pit." The mean excess plot indicates
that a range of possible thresholds (300 μm to
550 μm) would be appropriate for model fitting.
Table 2 gives the results for such model fitting
using maximum likelihood for a range of values of
threshold. Here $\lambda$ is the mean exceedance rate per
m², $\sigma$ and $\xi$ are the parameter estimates for the
GPD, and $q_1$ and $q_{10}$ are those levels that would
be exceeded once on average every m² and every 10
m² respectively. Standard errors are given in brackets.
If $q_{10}$ is considered, we see that its estimated
value tends to decrease as the threshold increases,
its value being highly sensitive to the value of $\xi$.
Table 2. Summary of model fitting and prediction using maximum likelihood for the generalized Pareto distribution
for a typical 26 week basalt groundwater coupon profile

Threshold   Mean cluster      Number of   $\lambda$       $\sigma$            $\xi$           $q_1$         $q_{10}$
(μm)        exceedance (μm)   clusters    (per m²)
300         99                177         4425 (333)      98.0 (11)           0.01 (0.08)     1158 (260)    1406 (430)
350         92                146         3650 (302)      99.0 (11)           0.04 (0.09)     1205 (300)    1500 (527)
400         97                96          2400 (245)      104.3 (16)          -0.08 (0.11)    1004 (214)    1102 (322)
450         83                76          1900 (218)      83.4 (illegible)    -0.01 (0.13)    1057 (241)    1233 (405)
500         90                50          1250 (177)      102.6 (23)          -0.14 (0.17)    963 (213)     1037 (309)
550         87                29          725 (135)       108.5 (12)          -0.13 (0.31)    918 (250)     961 (339)


For higher thresholds the large negative value of $\xi$
is indicative of a tail distribution which is shorter
than exponential, so implying lower return values.
For lower thresholds the tail appears to be exponential,
implying relatively higher return values.
This effect can be seen further in an exponential
probability plot of the exceedances above 300 μm,
Fig. 2(b). As the threshold increases more weight is
given to the extreme observations, which are themselves
smaller than would be expected for an exponential
tail. The lack of an objective method for
determining the correct threshold therefore leads
to difficulties in prediction.
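
To make the sensitivity to the threshold concrete, the return level formula of Section 2 can be evaluated directly. The sketch below is illustrative only: `gpd_return_level` is a hypothetical name, the exponential-tail branch is the standard limit of the formula as the shape parameter tends to zero, and the parameter values are rows of Table 2 as transcribed here.

```python
import math


def gpd_return_level(u, sigma, xi, lam, n_units):
    """N-unit return level for exceedances of threshold u.

    Implements q_N = u - (sigma / xi) * (1 - (lam * N) ** xi), with the
    xi -> 0 limit q_N = u + sigma * log(lam * N) for an exponential tail.
    """
    rate = lam * n_units
    if abs(xi) < 1e-12:
        return u + sigma * math.log(rate)
    return u - (sigma / xi) * (1.0 - rate ** xi)


# (threshold, sigma, xi, lambda per square metre) from rows of Table 2.
rows = [
    (300, 98.0, 0.01, 4425),
    (400, 104.3, -0.08, 2400),
    (450, 83.4, -0.01, 1900),
    (500, 102.6, -0.14, 1250),
]
for u, sigma, xi, lam in rows:
    q1 = gpd_return_level(u, sigma, xi, lam, 1)
    q10 = gpd_return_level(u, sigma, xi, lam, 10)
    print(u, round(q1, 1), round(q10, 1))
```

Because the tabulated estimates are rounded, the computed levels should only be expected to agree with the printed $q_1$ and $q_{10}$ to within about a micrometre.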



Fig. 2(a). Mean excess plot for typical 26 week basalt groundwater
coupon profile (horizontal axis: threshold/μm): O, mean declustered
exceedances; □, mean of all exceedances.

Fig. 2(b). Exponential probability plot of declustered exceedances
above 300 μm.

3. Extreme Value Distributions 

Data suitable for this type of analysis can arise as 
the largest value from each of a set of coupons, or 
from dividing an area into equal smaller areas and 
selecting one maximum from each smaller area, 
provided the scale of division and corrosion pat- 
terns are compatible in the sense described above 
for the generalized Pareto distribution. For a sam- 
ple of independent identically distributed random 






Volume 99, Number 4, July-August 1994 

Journal of Research of the National Institute of Standards and Technology 



variables, $x_1, \ldots, x_n$, the distribution of $x_{(n)}$, the data
maximum, depends on $n$. Suppose however that
there exist location and scale factors, $a_n$ and $b_n$ say,
so that the rescaled variate, $y = a_n + b_n x_{(n)}$, has a distribution
which is independent of $n$. This is the so-called
"stability postulate," and leads immediately
to the following functional equation (to be solved
for $F$): $F(x)^n = F(a_n + b_n x)$. The solution to this
equation is the generalized extreme value (GEV)
distribution, which can be written in the following
3-parameter form:

$$F(x) = \exp\{-[1 + \xi(x - \mu)/\psi]^{-1/\xi}\} \qquad (2)$$

See for example Ref. [5]. Note also that if the assumption
of independence is relaxed, under general
conditions the distribution, Eq. (2), is still the
appropriate one for maxima. It turns out that al- 
most all standard distributions satisfy the stability 
postulate asymptotically, although it is only exactly 
true for the GEV distribution itself. This is exactly 
analogous to the Central Limit Theorem for aver- 
ages, which is satisfied asymptotically by almost all 
standard distributions, but only holds exactly for an 
initial Normal distribution. As with averages, which 
are assumed Normal, by the Central Limit Theo- 
rem, and then fitted accordingly, so with maxima, it 
is reasonable to assume a GEV distribution and fit 
accordingly. Since the dependence of the stability 
coefficients, $a_n$, $b_n$, on $n$ is typically logarithmic, or
slower, we can extract maxima from samples which 
are roughly the same size. In engineering practice 
this is often almost unverifiable, but nevertheless a 
plausible assumption, since the bulk of the data, 
"too small to be seen," may be uncounted, let 
alone observed. The physical size of components 
and common conditions may be the only justifica- 
tion. 
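
As a small numerical illustration of the stability postulate, the Gumbel case ($\xi \to 0$ in Eq. (2)) satisfies the functional equation exactly with $a_n = -\psi \log n$ and $b_n = 1$, which is a standard result rather than something derived in this paper. The sketch below checks this; the function names and the parameter values are arbitrary choices made for illustration.

```python
import math


def gumbel_cdf(x, mu=0.0, psi=1.0):
    """Gumbel distribution function, the xi -> 0 limit of the GEV form."""
    return math.exp(-math.exp(-(x - mu) / psi))


def stability_gap(x, n, mu=0.0, psi=1.0):
    """|F(x)^n - F(a_n + b_n x)| with a_n = -psi * log(n) and b_n = 1."""
    a_n = -psi * math.log(n)
    return abs(gumbel_cdf(x, mu, psi) ** n - gumbel_cdf(a_n + x, mu, psi))


# Arbitrary illustrative parameters: mu = 1, psi = 2.
gaps = [stability_gap(x, n, mu=1.0, psi=2.0)
        for x in (-1.0, 0.5, 3.0, 8.0)
        for n in (2, 10, 1000)]
print(max(gaps))  # expected to be at the level of floating point rounding
```

The location shift grows only like $\log n$, which is consistent with the remark that maxima may be extracted from samples of roughly, rather than exactly, equal size.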

For extrapolation over larger areas (for extremes 
derived from random sampling over a large struc- 
ture) or over longer time periods (for extremes 
derived from sampling at regular intervals of time),
the $N$th return level can be defined by solving
$F(x) = 1 - 1/N$. Again $N$ is interpreted as in the
previous section. Alternatively, after fitting the dis- 
tribution to the given data, the implied distribution 
of extreme values from future samples over larger 
areas and longer lengths of time (with equal base 
populations) can be deduced and properties such 
as the mean extreme, etc., inferred from this more 
fundamental approach. For a full discussion see 
Ref. [1]. However, the return period method is particularly
easy to implement for type I extreme value
probability plots. For examples of these plots ap- 
plied to pit depths in steels exposed to marine envi- 
ronments see Refs. [6,7]. The parameters can also 
be regressed on covariates as appropriate, to allow 
for dependence on measured environment vari- 
ables and/or time, see for example Ref. [8]. A more 
subtle approach for modeling covariates would use 
an extreme value regression model of the sort considered
in the context of the Weibull distribution
[9].
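
Solving $F(x) = 1 - 1/N$ for the GEV form of Eq. (2) gives a closed-form return level, sketched below. The scale symbol $\psi$, the function names and the numerical parameter values are illustrative assumptions and are not taken from any fit in the paper.

```python
import math


def gev_cdf(x, mu, psi, xi):
    """GEV distribution function in the 3-parameter form of Eq. (2)."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / psi))
    z = 1.0 + xi * (x - mu) / psi
    if z <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-z ** (-1.0 / xi))


def gev_return_level(n, mu, psi, xi):
    """Level x_N solving F(x_N) = 1 - 1/N."""
    y = -math.log(1.0 - 1.0 / n)
    if abs(xi) < 1e-12:
        return mu - psi * math.log(y)
    return mu + (psi / xi) * (y ** (-xi) - 1.0)


# Hypothetical parameters with a negative shape, so a finite upper bound.
mu, psi, xi = 1.0, 0.3, -0.2
upper = mu - psi / xi
for n in (2, 10, 100, 10000):
    print(n, round(gev_return_level(n, mu, psi, xi), 4), "bound:", upper)
```

For a negative shape parameter the return levels increase with $N$ but never exceed the end point $\mu - \psi/\xi$.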

In Ref. [10] five circular coupons were
exposed to a corrosive medium for each of four
different exposure times: 1000 h, 3000 h, 5000 h,
and 8000 h. The maximum pit depth was measured
in each of six equal sectors on each specimen.
Nominally this gave 120 pit depths in all; however,
for many coupons, pits overlapped into a number
of sectors and so the number of independent maxima
was significantly reduced. Figure 3 shows a
plot of maximum pit depth against exposure time
for the resulting data. The plotted mean function and
upper bound are based on the fitting of a 4-parameter
time dependent GEV distribution for
which $\mu_t = \mu t^{\beta}$, $\psi_t = \psi t^{\beta}$ and $\xi$ is constant. This
model gives

$$\mu_t = 0.912(\pm 0.063)\,t^{\beta}, \quad \psi_t = 0.293(\pm 0.037)\,t^{\beta},$$

$$\beta = 0.298(\pm 0.051), \quad \xi = -0.216(\pm 0.121).$$

Fig. 3. Maximum pit depths against time for carbon steel in
alkaline conditions along with fitted mean function, upper
bound and confidence curves for the upper bound.
The corresponding mean function is
$[\mu + (\psi/\xi)\{\Gamma(1-\xi) - 1\}]\,t^{\beta} = \eta t^{\beta}$, which agrees with the common
assumption made in the corrosion literature
of a power law growth of the mean maximum pit
depth with time [8,11,12]. The implied upper
bound is then $\theta_t = \theta t^{\beta} = (\mu - \psi/\xi)\,t^{\beta}$. Such means
and bounds can be extrapolated out to larger areas
of exposed metal and to longer time periods using
the methods described in Ref. [1]. Standard errors
on the upper bound were calculated by reparameterizing
the problem and constructing a profile
likelihood for $\theta$, as in Ref. [2]. The negative value
for the shape parameter $\xi$ has been observed by the
authors of this paper consistently for corrosion
phenomena of many types and in many environments.
This has important consequences for extrapolation
since, in corrosion engineering, return
levels are often very large (e.g., it may only be possible
to inspect a small number of one meter sections
of a buried pipeline which may be hundreds
of kilometers in length), and so for the range of
values of $\xi$ encountered by the authors, the maximum
will be very close to the upper bound or end
point of the distribution. This should be contrasted
with the commonly used $\xi = 0$, type I extreme value
distribution [6-8,11], for which there is no upper
bound.
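
The mean and upper-bound coefficients implied by the fitted time-dependent GEV model can be evaluated as in the sketch below. It uses the standard GEV mean, $\mu + (\psi/\xi)\{\Gamma(1-\xi) - 1\}$, and the point estimates quoted above; the function names are hypothetical, and because the units of $t$ are not stated in this section only the coefficients and generic functions of $t$ are defined.

```python
import math


def gev_mean_coeff(mu, psi, xi):
    """eta = mu + (psi / xi) * (Gamma(1 - xi) - 1); needs xi < 1, xi != 0."""
    return mu + (psi / xi) * (math.gamma(1.0 - xi) - 1.0)


def gev_upper_bound_coeff(mu, psi, xi):
    """theta = mu - psi / xi; a finite upper end point exists only if xi < 0."""
    if xi >= 0:
        raise ValueError("no finite upper bound unless xi < 0")
    return mu - psi / xi


# Point estimates quoted in the text for the data of Ref. [10].
mu, psi, beta, xi = 0.912, 0.293, 0.298, -0.216
eta = gev_mean_coeff(mu, psi, xi)
theta = gev_upper_bound_coeff(mu, psi, xi)


def mean_max_depth(t):
    """Fitted mean of the maximum pit depth, eta * t ** beta."""
    return eta * t ** beta


def upper_bound(t):
    """Fitted upper end point of the maximum pit depth, theta * t ** beta."""
    return theta * t ** beta


print(round(eta, 3), round(theta, 3))
```

The ratio of the bound to the mean, `theta / eta`, does not depend on $t$ in this model, since both scale with the same power law.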



4. Extreme Order Statistics 



There is a corresponding asymptotic result concerning
the joint distribution of the $r$ largest values,
$x_{\max} = x_{(1)} \ge \ldots \ge x_{(r)}$, from a sample of independent
identically distributed random variables. Data will
in general then consist of $m$ sets of such largest
values. The joint generalized extreme value distribution
(JGEV) has density

$$f(x_1, x_2, \ldots, x_r) = \psi^{-r} \exp\Big\{-\Big[1 + \frac{\xi}{\psi}(x_r - \mu)\Big]^{-1/\xi} - \Big(\frac{1}{\xi} + 1\Big)\sum_{j=1}^{r} \log\Big[1 + \frac{\xi}{\psi}(x_j - \mu)\Big]\Big\}, \qquad (3)$$

valid for $\xi x_j > \xi\mu - \psi = \xi\theta$, $\psi > 0$ $(j = 1, \ldots, r)$, where $\theta = \mu - \psi/\xi$. See for
example Ref. [13]. This is the appropriate distribution
to use when the $r$ (say) largest values are extracted
from coupons or sampled areas, instead of
just the single largest value. This provides a useful
extension to the classical theory in such a way as to
match up with the common practice of measuring
the few largest pits at any one location undergoing
pitting. Using all this information rather than just
the single largest extreme enables smaller confidence
bands to be drawn around predicted values.
However care is needed to ensure that $r$ is not
taken so large as to invalidate the choice of the
asymptotic distribution, Eq. (3).

When $\xi = 0$, this model reduces to the Gumbel
form of the JGEV with density

$$f(x_1, x_2, \ldots, x_r) = \psi^{-r} \exp\Big\{-\exp\Big[-\frac{1}{\psi}(x_r - \mu)\Big] - \frac{1}{\psi}\sum_{j=1}^{r}(x_j - \mu)\Big\}. \qquad (4)$$

A useful diagnostic here is the joint Gumbel plot.
When $x_{(1)} \ge \ldots \ge x_{(r)}$ have density, Eq. (4),
$E(x_{(i)}) = \mu - \psi\,\varphi(i)$ (all $1 \le i \le r$) [14], where $\varphi(\cdot)$ is
the digamma function. Thus a plot of the order
statistics $x_{(i)}$ against $-\varphi(i)$ will give a straight line
with slope $\psi$ and intercept $\mu$ if the Gumbel form of
the JGEV distribution is appropriate. Such a plot
is shown in Fig. 4 for each of the pitted college
roofs data in Table 1. This plot indicates that these
extremes arise from perhaps a mixture of two tail
distributions. However it was assumed that $\xi = 0$
for both roofs and that, for roof 1, the two largest
values were outliers from the model, Eq. (4).
These two values were removed for the purpose of
analysis, and the slopes and intercepts resulting
used as starting values for determining the maximum
likelihood estimates of the parameters in Eq.
(4). The fitted values, with their standard errors,
were

$\mu = 54.2\ (\pm 7.9)$, $\psi = 12.5\ (\pm 2.1)$, roof 1,

$\mu = 103.2\ (\pm 15.8)$, $\psi = 26.0\ (\pm 4.2)$, roof 2.

These values are then available for the implied
Gumbel distribution of the maximum value, which
has mean $\mu + 0.5772\psi$. This gives 61.4 μm for roof
1 and 118.2 μm for roof 2. Extrapolation could now
proceed according to the method described in the
previous section, noting however that the mean of
the maximum for roof 1 is considerably out of line
with the observed maximum of 131 μm.
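
The joint Gumbel plot and the starting values it provides can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper names are hypothetical, the digamma function is evaluated only at positive integers through its harmonic-sum form, the straight line is fitted by ordinary least squares, and the Roof 2 depths are transcribed from Table 1 with the second entry read as 106.

```python
EULER_GAMMA = 0.5772156649015329


def digamma_int(i):
    """Digamma function at a positive integer: -gamma + sum_{k<i} 1/k."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, i))


def joint_gumbel_plot_points(ordered_desc):
    """Pairs (-digamma(i), x_(i)) for values sorted from largest to smallest."""
    return [(-digamma_int(i), x) for i, x in enumerate(ordered_desc, start=1)]


def least_squares_line(points):
    """Slope and intercept of the ordinary least squares line through points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gumbel_mean(mu, psi):
    """Mean of the implied Gumbel distribution of the maximum."""
    return mu + EULER_GAMMA * psi


# Roof 2 pit depths (micrometres) from Table 1; the second entry is
# read as 106, which is an assumption about a garbled value.
roof2 = [140, 106, 95, 77, 72, 55, 55, 53, 52, 36, 33, 32, 32, 30, 28, 28,
         26, 26, 25, 24, 24, 24, 22, 22, 20, 18, 18, 16, 16, 16, 16, 14,
         14, 12, 12, 12, 8, 8, 8]

slope, intercept = least_squares_line(joint_gumbel_plot_points(roof2))
print("starting values: psi ~", round(slope, 1), "mu ~", round(intercept, 1))
print(round(gumbel_mean(54.2, 12.5), 1), round(gumbel_mean(103.2, 26.0), 1))
```

The least squares values are only starting points for the likelihood maximization and need not match the fitted values quoted in the text; the last line simply reproduces the reported means of the maximum from the reported estimates.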

Reference [15] reports on an experiment where 
15 low alloy steel specimens were suspended in a 
deionized warm water bath under free corrosion 
conditions. Specimens were removed at varying in- 
tervals up to 71 days, then after cleaning, pit 
depths and diameters were measured optically. A 
4-parameter JGEV distribution incorporating a 
power law dependence on time [16] was fitted to 
these pit-depths, utilizing the two largest pits from 
each side of the specimens giving parameter values: 

$$\mu_t = 7.041(\pm 0.710)\,t^{\beta}, \quad \psi_t = 0.467(\pm 0.066)\,t^{\beta},$$

$$\beta = 0.609(\pm 0.016), \quad \xi = -0.513(\pm 0.126).$$

These are the maximum likelihood estimates for 
their data, for which they were only, at that time, 
able to report initial probability weighted moment 
and regression estimates. Figure 5 shows a plot of 
this data along with the fitted mean function and 
upper bound, and confidence curves for the upper 
bound calculated using the profile likelihood 
method discussed in the previous section. 
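
For readers who wish to fit the $r$ largest values themselves, the log-density of the JGEV form, Eq. (3), and its Gumbel limit, Eq. (4), can be written compactly. The sketch below is an assumption-laden illustration rather than the procedure used in the paper: the function names are hypothetical, $\psi$ denotes the scale parameter, and no optimizer or time dependence is included.

```python
import math


def jgev_logpdf(xs_desc, mu, psi, xi):
    """Log-density of the r largest values, given from largest to smallest."""
    r = len(xs_desc)
    x_r = xs_desc[-1]  # smallest of the r retained values
    if abs(xi) < 1e-9:
        # Gumbel form, Eq. (4).
        return (-r * math.log(psi)
                - math.exp(-(x_r - mu) / psi)
                - sum((x - mu) / psi for x in xs_desc))
    z = [1.0 + xi * (x - mu) / psi for x in xs_desc]
    if min(z) <= 0.0:
        return float("-inf")  # outside the support of the distribution
    return (-r * math.log(psi)
            - z[-1] ** (-1.0 / xi)
            - (1.0 / xi + 1.0) * sum(math.log(v) for v in z))


def neg_log_likelihood(sets, mu, psi, xi):
    """Negative log-likelihood over m independent sets of r largest values."""
    return -sum(jgev_logpdf(s, mu, psi, xi) for s in sets)


# Hypothetical data: three sets of the two largest depths, arbitrary units.
example_sets = [[2.1, 1.8], [1.9, 1.7], [2.4, 1.6]]
print(neg_log_likelihood(example_sets, mu=1.8, psi=0.3, xi=-0.2))
```

A power law dependence on time could be introduced by evaluating the function with $\mu t^{\beta}$ and $\psi t^{\beta}$ in place of $\mu$ and $\psi$ for each set, and the resulting function passed to a general-purpose numerical minimizer.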



Fig. 5. First and second largest pit depths against time for low
alloy steel in deionized warm water, along with fitted mean
function, upper bound and confidence curves for the upper bound.

5. Discussion

A number of statistical techniques relating to extreme
value theory have been described and
demonstrated on selected sets of corrosion data.
Noting that much corrosion data are inherently of
an extreme nature, purely statistical considerations
along the lines described in this paper may be the
only means of determining numerical values for
prediction of the maximum pit depth in an area $A$
at time $t$, for example, along with some estimate of
precision or possible error. There is much evidence
in the literature that $\xi < 0$ for the GEV distribution
in the context of extremes of corrosion phenomena.
Return levels are often very large and so, for the
range of values of $\xi$ encountered, predicted maxima
will often be very close to the implied upper
bound or end point of the distribution.

It should be noted however, that with all the 
methods described here, there are pitfalls. When 
modeling exceedances, for example, it is difficult to 
choose the threshold objectively, and different 
thresholds can lead to different predictions. Similar 
problems exist in the use of the r largest order 
statistics and also the maximum itself. How many 
largest order statistics should be used? When 
recording single maxima, how large should the 
sampled area be? While some theoretical results 
are available to answer such questions (e.g., Ref. 
[17]) these are not very helpful in a practical con- 
text. 